(54) Audiovisual information management system



(57) Providing at least one description scheme. For 
audio and/or video programs a program description 
scheme (18) provides information (10) regarding the 
associated program. For the user a user description 
scheme (20) provides information (14) regarding the 
user's preferences. For the system a system description
scheme (22) provides information regarding the system.
The description schemes are independent of one
another. Preferably, the program description scheme
(18), user description scheme (20), and system description
scheme (22) are independent of one another.
Description

BACKGROUND OF THE INVENTION 

[0001] The present invention relates to a system for managing audiovisual information, and in particular to a system
for audiovisual information browsing, filtering, searching, archiving, and personalization. 

[0002] Video cassette recorders (VCRs) may record video programs in response to pressing a record button or may 
be programmed to record video programs based on the time of day. However, the viewer must program the VCR based 
on information from a television guide to identify relevant programs to record. After recording, the viewer scans through 
the entire video tape to select relevant portions of the program for viewing using the functionality provided by the VCR, 
such as fast forward and fast reverse. Unfortunately, the searching and viewing is based on a linear search, which may 
require significant time to locate the desired portions of the program(s) and fast forward to the desired portion of the 
tape. In addition, it is time consuming to program the VCR in light of the television guide to record desired programs.
Also, unless the viewer recognizes the programs from the television guide as desirable it is unlikely that the viewer will 
select such programs to be recorded. 

[0003] RePlayTV and TiVo have developed hard disk based systems that receive, record, and play television broadcasts
in a manner similar to a VCR. The systems may be programmed with the viewer's viewing preferences. The systems
use a telephone line interface to receive scheduling information similar to that available from a television guide.
Based upon the system programming and the scheduling information, the system automatically records programs that 
may be of potential interest to the viewer. Unfortunately, viewing the recorded programs occurs in a linear manner and 
may require substantial time. In addition, each system must be programmed for an individual's preference, likely in a 
different manner. 

[0004] Freeman et al., U.S. Patent No. 5,861,881, disclose an interactive computer system where subscribers can
receive individualized content. 

[0005] With all the aforementioned systems, each individual viewer is required to program the device according to 
his particular viewing preferences. Unfortunately, each different type of device has different capabilities and limitations 
which limit the selections of the viewer. In addition, each device includes a different interface which the viewer may be 
unfamiliar with. Further, if the operator's manual is inadvertently misplaced it may be difficult for the viewer to efficiently 
program the device. 

SUMMARY OF THE INVENTION 

[0006] The present invention overcomes the aforementioned drawbacks of the prior art by providing at least one 
description scheme. For audio and/or video programs a program description scheme provides information regarding 
the associated program. For the user a user description scheme provides information regarding the user's preferences. 
For the system a system description scheme provides information regarding the system. The description schemes are 
independent of one another. In the preferred embodiment the system may use a combination of the description 
schemes to enhance its ability to search, filter, and browse audiovisual information in a personalized and effective
manner.

[0007] The foregoing and other objectives, features and advantages of the invention will be more readily under- 
stood upon consideration of the following detailed description of the invention, taken in conjunction with the accompa- 
nying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 



[0008] 



FIG. 1 is an exemplary embodiment of a program, a system, and a user, with associated description schemes, of
an audiovisual system of the present invention. 

FIG. 2 is an exemplary embodiment of the audiovisual system, including an analysis module, of FIG. 1.

FIG. 3 is an exemplary embodiment of the analysis module of FIG. 2. 

FIG. 4 is an illustration of a thumbnail view (category) for the audiovisual system. 

FIG. 5 is an illustration of a thumbnail view (channel) for the audiovisual system. 

FIG. 6 is an illustration of a text view (channel) for the audiovisual system. 

FIG. 7 is an illustration of a frame view for the audiovisual system. 

FIG. 8 is an illustration of a shot view for the audiovisual system. 

FIG. 9 is an illustration of a key frame view for the audiovisual system.

FIG. 10 is an illustration of a highlight view for the audiovisual system. 
FIG. 11 is an illustration of an event view for the audiovisual system.

FIG. 12 is an illustration of a character/object view for the audiovisual system. 

FIG. 13 is an alternative embodiment of a program description scheme including a syntactic structure description
scheme, a semantic structure description scheme, a visualization description scheme, and a meta information
description scheme.

FIG. 14 is an exemplary embodiment of the visualization description scheme of FIG. 13. 

FIG. 15 is an exemplary embodiment of the meta information description scheme of FIG. 13. 

FIG. 16 is an exemplary embodiment of a segment description scheme for the syntactic structure description
scheme of FIG. 13.

FIG. 17 is an exemplary embodiment of a region description scheme for the syntactic structure description scheme
of FIG. 13. 

FIG. 18 is an exemplary embodiment of a segment/region relation description scheme for the syntactic structure 
description scheme of FIG. 13. 

FIG. 19 is an exemplary embodiment of an event description scheme for the semantic structure description scheme 
of FIG. 13.

FIG. 20 is an exemplary embodiment of an object description scheme for the semantic structure description 
scheme of FIG. 13. 

FIG. 21 is an exemplary embodiment of an event/object relation graph description scheme for the syntactic struc- 
ture description scheme of FIG. 13. 


DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

[0009] Many households today have many sources of audio and video information, such as multiple television sets,
multiple VCRs, a home stereo, a home entertainment center, cable television, satellite television, internet broadcasts,
world wide web, data services, specialized Internet services, portable radio devices, and a stereo in each of their
vehicles. For each of these devices, a different interface is normally used to obtain, select, record, and play the video
and/or audio content. For example, a VCR permits the selection of the recording times but the user has to correlate the
television guide with the desired recording times. Another example is the user selecting a preferred set of preselected
radio stations for his home stereo and also presumably selecting the same set of preselected stations for each of the
user's vehicles. If another household member desires a different set of preselected stereo selections, each audio
device would need to be reprogrammed at substantial inconvenience.

[0010] The present inventors came to the realization that users of visual information and listeners to audio
information, such as for example radio, audio tapes, video tapes, movies, and news, desire to be entertained and informed
in more than merely one uniform manner. In other words, the audiovisual information presented to a particular user
should be in a format and include content suited to their particular viewing preferences. In addition, the format should
be dependent on the content of the particular audiovisual information. The amount of information presented to a user or
a listener should be limited to only the amount of detail desired by the particular user at the particular time. For
example, with the ever increasing demands on the user's time, the user may desire to watch only 10 minutes of or merely
the highlights of a basketball game. In addition, the present inventors came to the realization that the necessity of
programming multiple audio and visual devices with their particular viewing preferences is a burdensome task, especially
when presented with unfamiliar recording devices when traveling. When traveling, users desire to easily configure
unfamiliar devices, such as audiovisual devices in a hotel room, with their viewing and listening preferences in an
efficient manner.
[0011] The present inventors came to the further realization that a convenient technique of merely recording the
desired audio and video information is not sufficient because the presentation of the information should be in a manner
that is time efficient, especially in light of the limited time frequently available for the presentation of such information.
In addition, the user should be able to access only that portion of all of the available information that the user is
interested in, while skipping the remainder of the information.

[0012] A user is not capable of watching or otherwise listening to the vast potential amount of information available
through all, or even a small portion of, the sources of audio and video information. In addition, with the increasing
information potentially available, the user is not likely even aware of the potential content of information that he may
be interested in. In light of the vast amount of audio, image, and video information, the present inventors came to the
realization that a system that records and presents to the user audio and video information based upon the user's prior
viewing and listening habits, preferences, and personal characteristics, generally referred to as user information, is
desirable. In addition, the system may present such information based on the capabilities of the system devices. This
permits the system to record desirable information and to customize itself automatically to the user and/or listener. It is
to be understood that the terms user, viewer, and/or listener may be used interchangeably for any type of content. Also,
the user information should be portable between and usable by different devices so that other devices may likewise be
configured automatically to the particular user's preferences upon receiving the viewing information.
[0013] In light of the foregoing realizations and motivations, the present inventors analyzed a typical audio and
video presentation environment to determine the significant portions of the typical audiovisual environment. First,
referring to FIG. 1, the video, image, and/or audio information 10 is provided or otherwise made available to a user and/or
a (device) system. Second, the video, image, and/or audio information is presented to the user from the system 12
(device), such as a television set or a radio. Third, the user both interacts with the system (device) 12 to view the
information 10 in a desirable manner and has preferences that define which audio, image, and/or video information is
obtained in accordance with the user information 14. After the proper identification of the different major aspects of an
audiovisual system the present inventors then realized that information is needed to describe the informational content
of each portion of the audiovisual system 16.

[0014] With three portions of the audiovisual presentation system 16 identified, the functionality of each portion is
identified together with its interrelationship to the other portions. To define the necessary interrelationships, a set of 
description schemes containing data describing each portion is defined. The description schemes include data that is 
auxiliary to the programs 10, the system 12, and the user 14, to store a set of information, ranging from human readable 
text to encoded data, that can be used in enabling browsing, filtering, searching, archiving, and personalization. By pro- 
viding a separate description scheme describing the program(s) 10, the user 14, and the system 12, the three portions 
(program, user, and system) may be combined together to provide an interactivity not previously achievable. In addition, 
different programs 10, different users 14, and different systems 12 may be combined together in any combination, while
still maintaining full compatibility and functionality. It is to be understood that the description scheme may contain the 
data itself or include links to the data, as desired. 

[0015] A program description scheme 18 related to the video, still image, and/or audio information 10 preferably
includes two sets of information, namely, program views and program profiles. The program views define logical struc- 
tures of the frames of a video that define how the video frames are potentially to be viewed suitable for efficient brows- 
ing. For example the program views may contain a set of fields that contain data for the identification of key frames, 
segment definitions between shots, highlight definitions, video summary definitions, different lengths of highlights, 
thumbnail set of frames, individual shots or scenes, representative frame of the video, grouping of different events, and 
a close-up view. The program view descriptions may contain thumbnail, slide, key frame, highlights, and close-up views 
so that users can filter and search not only at the program level but also within a particular program. The description 
scheme also enables users to access information in varying detail amounts by supporting, for example, a key frame 
view as a part of a program view providing multiple levels of summary ranging from coarse to fine. The program profiles 
define distinctive characteristics of the content of the program, such as actors, stars, rating, director, release date, time 
stamps, keyword identification, trigger profile, still profile, event profile, character profile, object profile, color profile,
texture profile, shape profile, motion profile, and categories. The program profiles are especially suitable to facilitate
filtering and searching of the audio and video information. By providing a user description scheme, the description
scheme also enables users to discover interesting programs that they may be unaware of. The user
description scheme provides information to a software agent that in turn performs a search and filtering on behalf of the 
user by possibly using the system description scheme and the program description scheme information. It is to be 
understood that in one of the embodiments of the invention merely the program description scheme is included. 
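
As an illustration only, the two sets of information in a program description scheme (program views and program
profiles) might be modeled as follows. This is a minimal sketch, not the format used by the invention: the class and
field names (`ProgramViews`, `ProgramProfile`, `ProgramDescription`, `key_frames_at`) are hypothetical, and plain frame
indices stand in for whatever identification data a real description would carry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProgramViews:
    # Key frames per summary level: level 0 is the coarsest, higher levels are finer.
    key_frame_levels: List[List[int]] = field(default_factory=list)
    # Highlights as (start_frame, end_frame) pairs.
    highlights: List[Tuple[int, int]] = field(default_factory=list)

    def key_frames_at(self, level: int) -> List[int]:
        """Return key frames for the requested level, clamped to the levels that exist."""
        if not self.key_frame_levels:
            return []
        level = max(0, min(level, len(self.key_frame_levels) - 1))
        return self.key_frame_levels[level]


@dataclass
class ProgramProfile:
    # A small subset of the profile fields named in the text.
    categories: List[str] = field(default_factory=list)
    actors: List[str] = field(default_factory=list)
    rating: str = ""


@dataclass
class ProgramDescription:
    title: str
    views: ProgramViews = field(default_factory=ProgramViews)
    profile: ProgramProfile = field(default_factory=ProgramProfile)


# Hypothetical program with a two-level key frame summary.
game = ProgramDescription(
    title="Basketball game",
    views=ProgramViews(key_frame_levels=[[0, 900], [0, 300, 600, 900]]),
    profile=ProgramProfile(categories=["sports", "basketball"]),
)
```

Keeping views and profiles in separate structures mirrors the split described above: views support browsing within a
program, while profiles support filtering and searching across programs.
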
[0016] Program views contained in the program description scheme are a feature that supports a functionality such
as close-up view. In the close-up view, a certain image object, e.g., a famous basketball player such as Michael Jordan,
can be viewed up close by playing back a close-up sequence that is separate from the original program. An alternative
view can be incorporated in a straightforward manner. Character profile on the other hand may contain spatio-temporal
position and size of a rectangular region around the character of interest. This region can be enlarged by the
presentation engine, or the presentation engine may darken outside the region to focus the user's attention to the
characters spanning a certain number of frames. Information within the program description scheme may contain data about
the initial size or location of the region, movement of the region from one frame to another, and duration in terms of the
number of frames featuring the region. The character profile also provides provision for including text annotation and
audio annotation about the character as well as web page information, and any other suitable information. Such
character profiles may include the audio annotation which is separate from and in addition to the associated audio track
of the video.
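
The region data described above (initial size and location, movement from frame to frame, and duration in frames)
could be sketched as follows. The names `RegionTrack` and `rect_at` are hypothetical, and the constant per-frame
displacement is a simplifying assumption made for this example; the text does not say how movement is encoded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class RegionTrack:
    start_frame: int
    initial: Rect
    motion_per_frame: Tuple[int, int]  # assumed constant (dx, dy) applied each frame
    duration_frames: int

    def rect_at(self, frame: int) -> Optional[Rect]:
        """Return the region at `frame`, or None outside the track's span."""
        offset = frame - self.start_frame
        if offset < 0 or offset >= self.duration_frames:
            return None
        x, y, w, h = self.initial
        dx, dy = self.motion_per_frame
        return (x + dx * offset, y + dy * offset, w, h)


# Hypothetical track: a character region drifting right for 30 frames.
track = RegionTrack(start_frame=100, initial=(40, 60, 80, 120),
                    motion_per_frame=(2, 0), duration_frames=30)
```

A presentation engine could call `rect_at` for each displayed frame and enlarge the returned rectangle, or darken
everything outside it, as the paragraph describes.
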

[0017] The program description scheme may likewise contain similar information regarding audio (such as radio 
broadcasts) and images (such as analog or digital photographs or a frame of a video). 

[0018] The user description scheme 20 preferably includes the user's personal preferences, and information 
regarding the user's viewing history such as for example browsing history, filtering history, searching history, and device 
setting history. The user's personal preferences include information regarding particular programs and categorizations
of programs that the user prefers to view. The user description scheme may also include personal information about the 
particular user, such as demographic and geographic information, e.g. zip code and age. The explicit definition of the 
particular programs or attributes related thereto permits the system 16 to select those programs from the information 
contained within the available program description schemes 18 that may be of interest to the user. Frequently, the user 
does not desire to learn to program the device nor desire to explicitly program the device. In addition, the user
description scheme 20 may not be sufficiently robust to include explicit definitions describing all desirable programs for a
particular user. In such a case, the capability of the user description scheme 20 to adapt to the viewing habits of the user
to accommodate different viewing characteristics not explicitly provided for or otherwise difficult to describe is useful. In
such a case, the user description scheme 20 may be augmented or any technique can be used to compare the
information contained in the user description scheme 20 to the available information contained in the program description
scheme 18 to make selections. The user description scheme provides a technique for holding user preferences ranging
from program categories to program views, as well as usage history. User description scheme information is persistent
but can be updated by the user or by an intelligent software agent on behalf of the user at any arbitrary time. It may also
be disabled by the user, at any time, if the user decides to do so. In addition, the user description scheme is modular
and portable so that users can carry or port it from one device to another, such as with a handheld electronic device or
smart card, or transport it over a network connecting multiple devices. When the user description scheme is standardized
among different manufacturers or products, user preferences become portable. For example, a user can personalize
the television receiver in a hotel room, permitting users to access information they prefer at any time and anywhere. In
a sense, the user description scheme is persistent and timeless. In addition, selected information within the
program description scheme may be encrypted since at least part of the information may be deemed to be private (e.g.,
demographics). A user description scheme may be associated with an audiovisual program broadcast and compared
with a particular user's description scheme of the receiver to readily determine whether or not the program's intended
audience profile matches that of the user. It is to be understood that in one of the embodiments of the invention merely
the user description scheme is included.
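
The portability described above could be sketched as a small record that one device exports and another loads.
This is a hedged illustration only: the class `UserDescription`, its fields, and the choice of JSON as the interchange
format are assumptions made for the example, not something the text specifies.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class UserDescription:
    preferred_categories: List[str] = field(default_factory=list)
    viewing_history: List[str] = field(default_factory=list)
    enabled: bool = True  # the user may disable the scheme at any time

    def export(self) -> str:
        """Serialize to a string that could travel on a smart card or over a network."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def load(cls, payload: str) -> "UserDescription":
        """Rebuild the description on the receiving device."""
        return cls(**json.loads(payload))


# The home device exports the description; a hotel room receiver loads it.
home = UserDescription(preferred_categories=["sports", "news"])
hotel = UserDescription.load(home.export())
```

Because the record holds only the user's own preferences and history, it stays small, which is what makes carrying
it between devices practical.
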

[0019] The system description scheme 22 preferably manages the individual programs and other data. The
management may include maintaining lists of programs, categories, channels, users, videos, audio, and images. The
management may include the capabilities of a device for providing the audio, video, and/or images. Such capabilities may
include, for example, screen size, stereo, AC3, DTS, color, black/white, etc. The management may also include
relationships between any one or more of the user, the audio, and the images in relation to one or more of a program
description scheme(s) and a user description scheme(s). In a similar manner the management may include relationships
between one or more of the program description scheme(s) and user description scheme(s). It is to be understood that
in one of the embodiments of the invention merely the system description scheme is included.
[0020] The descriptors of the program description scheme and the user description scheme should overlap, at least
partially, so that potential desirability of the program can be determined by comparing descriptors representative of the
same information. For example, the program and user description scheme may include the same set of categories and
actors. The program description scheme has no knowledge of the user description scheme, and vice versa, so that
each description scheme is not dependent on the other for its existence. It is not necessary for the description schemes
to be fully populated. It is also beneficial not to include the program description scheme with the user description
scheme because there will likely be thousands of programs with associated description schemes which if combined
with the user description scheme would result in an unnecessarily large user description scheme. It is desirable to
keep the user description scheme small so that it is more readily portable. Accordingly, a system including only the
program description scheme and the user description scheme would be beneficial.
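
One simple way to compare descriptors that represent the same information is to count shared values over the
fields both schemes define. The sketch below is an assumption-laden illustration: the function name `overlap_score`,
the dictionary layout, and the counting rule are all hypothetical, and the text leaves the actual comparison technique
open.

```python
from typing import Dict, Set


def overlap_score(user: Dict[str, Set[str]], program: Dict[str, Set[str]]) -> int:
    """Count descriptor values shared by both schemes, over shared fields only."""
    shared_fields = user.keys() & program.keys()
    return sum(len(user[name] & program[name]) for name in shared_fields)


# Hypothetical, partially populated schemes with overlapping fields.
user_ds = {"categories": {"sports", "news"}, "actors": {"Michael Jordan"}}
program_ds = {
    "categories": {"sports", "basketball"},
    "actors": {"Michael Jordan"},
    "director": {"unknown"},  # no counterpart in the user scheme, so it is ignored
}
score = overlap_score(user_ds, program_ds)
```

Neither dictionary refers to the other, which reflects the independence of the two schemes: the comparison happens
in the system, and fields that only one scheme populates simply do not contribute.
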

[0021] The user description scheme and the system description scheme should include at least partially overlapping
fields. With overlapping fields the system can capture the desired information, which would otherwise not be
recognized as desirable. The system description scheme preferably includes a list of users and available programs. Based
on the master list of available programs, and associated program description scheme, the system can match the
desired programs. It is also beneficial not to include the system description scheme with the user description scheme
because there will likely be thousands of programs stored in the system description schemes which if combined with
the user description scheme would result in an unnecessarily large user description scheme. It is desirable to keep
the user description scheme small so that it is more readily portable. For example, the user description scheme may
include radio station preselected frequencies and/or types of stations, while the system description scheme includes
the available radio stations in particular cities. When traveling to a different city the user description scheme
together with the system description scheme will permit reprogramming the radio stations. Accordingly, a system
including only the system description scheme and the user description scheme would be beneficial.
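
The radio example can be sketched as a lookup that combines station types from the user description scheme with the
per-city station list from the system description scheme. Everything concrete here is invented for illustration: the
function `presets_for_city`, the city labels, the frequencies, and the station types are placeholders, not data from
the source.

```python
from typing import Dict, List


def presets_for_city(preferred_types: List[str],
                     city_stations: Dict[float, str]) -> List[float]:
    """Pick the frequencies in one city whose station type the user prefers."""
    wanted = set(preferred_types)
    return sorted(freq for freq, kind in city_stations.items() if kind in wanted)


# User description scheme: small and portable, holds only preferences.
user_ds = {"station_types": ["jazz", "news"]}

# System description scheme: holds the stations available in each city (made-up values).
system_ds = {
    "City A": {89.1: "jazz", 91.5: "news", 101.9: "rock"},
    "City B": {88.5: "jazz", 94.9: "news", 99.9: "rock"},
}

# After traveling to City B, the presets are rebuilt from the two schemes.
city_b_presets = presets_for_city(user_ds["station_types"], system_ds["City B"])
```

The user scheme never stores the station lists themselves, so it stays small regardless of how many cities the
system scheme covers.
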

[0022] The program description scheme and the system description scheme should include at least partially
overlapping fields. With the overlapping fields, the system description scheme will be capable of storing the information
contained within the program description scheme, so that the information is properly indexed. With proper indexing, the
system is capable of matching such information with the user information, if available, for obtaining and recording
suitable programs. If the program description scheme and the system description scheme were not overlapping then no
information would be extracted from the programs and stored. System capabilities specified within the system
description scheme of a particular viewing system can be correlated with a program description scheme to determine the views
that can be supported by the viewing system. For instance, if the viewing device is not capable of playing back video,
its system description scheme may describe its viewing capabilities as limited to keyframe view and slide view only. The
program description scheme of a particular program and the system description scheme of the viewing system are utilized
to present the appropriate views to the viewing system. Thus, a server of programs serves the appropriate views
according to a particular viewing system's capabilities, which may be communicated over a network or communication channel
connecting the server with the user's viewing device. It is preferred to maintain the program description scheme separate
from the system description scheme because the content providers repackage the content and description schemes in
different styles, times, and formats. Preferably the program description scheme is associated with the program, even if
displayed at a different time. Accordingly, a system including only the system description scheme and the program
description scheme would be beneficial.
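
The correlation between system capabilities and program views can be sketched as a filter applied by the server.
The function name `servable_views` and the view labels are hypothetical; the example only assumes that both schemes
name views with the same identifiers, which is the kind of field overlap the paragraph calls for.

```python
from typing import List


def servable_views(program_views: List[str], device_views: List[str]) -> List[str]:
    """Keep the program's views that the device reports it can present."""
    supported = set(device_views)
    return [view for view in program_views if view in supported]


# Views offered by a program description scheme (labels are illustrative).
program_ds_views = ["video", "highlight", "key_frame", "slide"]

# A device that cannot play back video reports still-image views only.
still_only_device = ["key_frame", "slide"]

views = servable_views(program_ds_views, still_only_device)
```

The result preserves the program's own ordering of views, so a server could offer the first entry as the default
presentation for that device.
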

[0023] By preferably maintaining the independence of each of the three description schemes while having fields 
that correlate the same information, the programs 10, the users 14, and the system 12 may be interchanged with one 
another while maintaining the functionality of the entire system 16. Referring to FIG. 2, the audio, visual, or audiovisual
program 38, is received by the system 16. The program 38 may originate at any suitable source, such as for example 
broadcast television, cable television, satellite television, digital television, internet broadcasts, world wide web, digital 
video discs, still images, video cameras, laser discs, magnetic media, computer hard drive, video tape, audio tape, data 
services, radio broadcasts, and microwave communications. The program description stream may originate from any 
suitable source, such as for example PSIP/DVB-SI information in digital television broadcasts, specialized digital televi- 
sion data services, specialized Internet services, world wide web, data files, data over the telephone, and memory, such 
as computer memory. The program, user, and/or system description scheme may be transported over a network (com- 
munication channel). For example, the system description scheme may be transported to the source to provide the 
source with views or other capabilities that the device is capable of using. In response, the source provides the device 
with image, audio, and/or video content customized or otherwise suitable for the particular device. The system 16 may 
include any device(s) suitable to receive any one or more of such programs 38. An audiovisual program analysis mod- 
ule 42 performs an analysis of the received programs 38 to extract and provide program related information (descrip- 
tors) to the description scheme (DS) generation module 44. The program related information may be extracted from the 
data stream including the program 38 or obtained from any other source, such as for example data transferred over a 
telephone line, data already transferred to the system 16 in the past or data from an associated file. The program 
related information preferably includes data defining both the program views and the program profiles available for the 
particular program 38. The analysis module 42 performs an analysis of the programs 38 using information obtained 
from (i) automatic audio-video analysis methods on the basis of low-level features that are extracted from the
program(s), (ii) event detection techniques, (iii) data that is available (or extractable) from data sources or electronic
program guides (EPGs, DVB-SI, and PSIP), and (iv) user information obtained from the user description scheme 20 to
provide data defining the program description scheme. 

[0024] The selection of a particular program analysis technique depends on the amount of readily available data 
and the user preferences. For example, if a user prefers to watch a 5 minute video highlight of a particular program,
such as a basketball game, the analysis module 42 may invoke a knowledge based system 90 (FIG. 3) to determine the 
highlights that form the best 5 minute summary. The knowledge based system 90 may invoke a commercial filter 92 to 
remove commercials and a slow motion detector 54 to assist in creating the video summary. The analysis module 42 
may also invoke other modules to bring information together (e.g., textual information) to author particular program
views. For example, if the program 38 is a home video where there is no further information available then the analysis
module 42 may create a key-frame summary by identifying key-frames of a multi-level summary and passing the
information to be used to generate the program views, and in particular a key frame view, to the description scheme.
Referring also to FIG. 3, the analysis module 42 may also include other sub-modules, such as for example, a de-mux/decoder
60, a data and service content analyzer 62, a text processing and text summary generator 64, a close caption analyzer
66, a title frame generator 68, an analysis manager 70, an audiovisual analysis and feature extractor 72, an event
detector 74, a key-frame summarizer 76, and a highlight summarizer 78. 

[0025] The generation module 44 receives the system information 46 for the system description scheme. The
system information 46 preferably includes data for the system description scheme 22 generated by the generation module
44. The generation module 44 also receives user information 48 including data for the user description scheme. The
user information 48 preferably includes data for the user description scheme generated within the generation module
44. The user input 48 may include, for example, meta information to be included in the program and system description
scheme. The user description scheme (or corresponding information) is provided to the analysis module 42 for selective
analysis of the program(s) 38. For example, the user description scheme may be suitable for triggering the highlight
generation functionality for a particular program and thus generating the preferred views and storing associated data in
the program description scheme. The generation module 44 and the analysis module 42 provide data to a data storage
unit 50. The storage unit 50 may be any storage device, such as memory or magnetic media.
[0026] A search, filtering, and browsing (SFB) module 52 implements the description scheme technique by parsing
and extracting information contained within the description scheme. The SFB module 52 may perform filtering,
searching, and browsing of the programs 38, on the basis of the information contained in the description schemes. An intelligent
software agent is preferably included within the SFB module 52 that gathers and provides user specific information to
the generation module 44 to be used in authoring and updating the user description scheme (through the generation
module 44). In this manner, desirable content may be provided to the user through a display 80. The selections of the
desired program(s) to be retrieved, stored, and/or viewed may be programmed, at least in part, through a graphical user
interface 82. The graphical user interface may also include or be connected to a presentation engine for presenting the
information to the user through the graphical user interface.

[0027] The intelligent management and consumption of audiovisual information using the multi-part description
stream device provides a next-generation device suitable for the modern era of information overload. The device
responds to changing lifestyles of individuals and families, and allows everyone to obtain the information they desire
anytime and anywhere they want.

[0028] An example of the use of the device may be as follows. A user comes home from work late Friday evening 
being happy the work week is finally over. The user desires to catch up with the events of the world and then watch 
ABC's 20/20 show later that evening. It is now 9 PM and the 20/20 show will start in an hour at 10 PM. The user is interested
in the sporting events of the week, and all the news about the Microsoft case with the Department of Justice. The
user description scheme may include a profile indicating that the particular user wants to obtain all available
information regarding the Microsoft trial and selected sporting events for particular teams. In addition, the system
description scheme and program description scheme provide information regarding the content of the available information
that may selectively be obtained and recorded. The system, in an autonomous manner, periodically obtains and
records the audiovisual information that may be of interest to the user during the past week based on the three description
schemes. The device most likely has recorded more than one hour of audiovisual information so the information
needs to be condensed in some manner. The user starts interacting with the system with a pointer or voice commands 
to indicate a desire to view recorded sporting programs. On the display, the user is presented with a list of recorded 
sporting events including basketball and soccer. Apparently the user's favorite football team did not play that week
because it was not recorded. The user is interested in basketball games and indicates a desire to view games. A set of
title frames is presented on the display that captures an important moment of each game. The user selects the Chicago 
Bulls game and indicates a desire to view a 5 minute highlight of the game. The system automatically generates highlights.
The highlights may be generated by audio or video analysis, or the program description scheme may include data
indicating the frames that are presented for a 5 minute highlight. The system may have also recorded web-based textual
information regarding the particular Chicago Bulls game which may be selected by the user for viewing. If desired, the
summarized information may be recorded onto a storage device, such as a DVD with a label. The stored information 
may also include an index code so that it can be located at a later time. After viewing the sporting events the user may 
decide to read the news about the Microsoft trial. It is now 9:50 PM and the user is done viewing the news. In fact, the 
user has selected to delete all the recorded news items after viewing them. The user then remembers to do one last
thing before 10 PM in the evening. The next day, the user desires to watch the VHS tape that he received from his
brother that day, containing footage about his brother's new baby girl and his vacation to Peru last summer. The user 
wants to watch the whole 2-hour tape but he is anxious to see what the baby looks like and also the new stadium built 
in Lima, which was not there last time he visited Peru. The user plans to take a quick look at a visual summary of the 
tape, browse, and perhaps watch a few segments for a couple of minutes, before the user takes his daughter to her
piano lesson at 10 AM the next morning. The user plugs the tape into his VCR, which is connected to the system, and
invokes the summarization functionality of the system to scan the tape and prepare a summary. The user can then view
the summary the next morning to quickly discover the baby's looks, and play back segments between the key-frames of
the summary to catch a glimpse of the crying baby. The system may also record the tape content onto the system hard
drive (or storage device) so the video summary can be viewed quickly. It is now 10:10 PM, and it seems that the user
is 10 minutes late for viewing 20/20. Fortunately, the system, based on the three description schemes, has already been
recording 20/20 since 10 PM. Now the user can start watching the recorded portion of 20/20 as the recording of 20/20
proceeds. The user will be done viewing 20/20 at 11:10 PM.

[0029] The average consumer has an ever increasing number of multimedia devices, such as a home audio system,
a car stereo, several home television sets, web browsers, etc. The user currently has to customize each of the
devices for optimal viewing and/or listening preferences. By storing the user preferences on a removable storage
device, such as a smart card, the user may insert the card including the user preferences into such media devices for
automatic customization. This results in the desired programs being automatically recorded on the VCR, and setting of
the radio stations for the car stereo and home audio system. In this manner the user only has to specify his preferences
at most once, on a single device, and subsequently the descriptors are automatically uploaded into devices by the
removable storage device. The user description scheme may also be loaded into other devices using a wired or wireless
network connection, e.g. that of a home network. Alternatively, the system can store the user history and create
entries in the user description scheme based on the user's audio and video viewing habits. In this manner, the user would
never need to program the viewing information to obtain desired information. In a sense, the user description scheme
enables modeling of the user by providing a central storage for the user's listening, viewing, and browsing preferences, and
the user's behavior. This enables devices to be quickly personalized, and enables other components, such as intelligent
agents, to communicate on the basis of a standardized description format, and to make smart inferences regarding the
user's preferences.

[0030] Many different realizations and applications can be readily derived from FIGS. 2 and 3 by appropriately
organizing and utilizing their different parts, or by adding peripherals and extensions as needed. In its most general
form, FIG. 2 depicts an audiovisual searching, filtering, browsing, and/or recording appliance that is personalizable. The
list of more specific applications/implementations given below is not exhaustive but covers a range.
[0031] The user description scheme is a major enabler for personalizable audiovisual appliances. If the structure
(syntax and semantics) of the description schemes is known amongst multiple appliances, the user can carry (or
otherwise transfer) the information contained within his user description scheme from one appliance to another, perhaps
via a smart card (where these appliances support a smart card interface), in order to personalize them. Personalization
can range from device settings, such as display contrast and volume control, to settings of television channels,
radio stations, web stations, web sites, geographic information, and demographic information such as age, zip code, etc.
Appliances that can be personalized may access content from different sources. They may be connected to the web,
terrestrial or cable broadcast, etc., and they may also access multiple or different types of single media such as video,
music, etc.
[0032] For example, one can personalize the car stereo using a smart card unplugged from the home system and
plugged into the car stereo system to be able to tune to favorite stations at certain times. As another example, one can
also personalize television viewing, for example, by plugging the smart card into a remote control that in turn will autonomously
command the television receiving system to present the user information about current and future programs
that fit the user's preferences. Different members of the household can instantly personalize the viewing experience
by inserting their own smart card into the family remote. In the absence of such a remote, this same type of personalization
can be achieved by plugging the smart card directly into the television system. The remote may likewise control
audio systems. In another implementation, the television receiving system holds user description schemes for multiple
users in local storage and identifies different users (or groups of users) by using an appropriate input interface, for
example an interface using user-voice identification technology. It is noted that in a networked system the user description
scheme may be transported over the network.

[0033] The user description scheme is generated by direct user input, and by using software that watches the
user to determine his/her usage pattern and usage history. The user description scheme can be updated in a dynamic fashion
by the user or automatically. A well defined and structured description scheme design allows different devices to
interoperate with each other. A modular design also provides portability.

[0034] The description scheme adds new functionality to those of the current VCR. An advanced VCR system can
learn from the user via direct input of preferences, or by watching the usage pattern and history of the user. The user
description scheme holds the user's preferences and usage history. An intelligent agent can then consult with the
user description scheme and obtain information that it needs for acting on behalf of the user. Through the intelligent
agent the system acts on behalf of the user to discover programs that fit the taste of the user, alert the user about such
programs, and/or record them autonomously. An agent can also manage the storage in the system according to the
user description scheme, i.e., prioritizing the deletion of programs (or alerting the user for transfer to a removable
media), or determining their compression factor (which directly impacts their visual quality) according to the user's preferences
and history.
[0035] The program description scheme and the system description scheme work in collaboration with the user
description scheme in achieving some tasks. In addition, the program description scheme and system description
scheme in an advanced VCR or other system will enable the user to browse, search, and filter audiovisual programs.
Browsing in the system offers capabilities that are well beyond fast forwarding and rewinding. For instance, the user can
view a thumbnail view of different categories of programs stored in the system. The user then may choose frame view,
shot view, key frame view, or highlight view, depending on their availability and the user's preference. These views can be
readily invoked using the relevant information in the program description scheme, especially in program views. The user
at any time can start viewing the program either in parts, or in its entirety.

[0036] In this application, the program description scheme may be readily available from many services such as: (i)
from broadcast (carried by EPG defined as a part of ATSC-PSIP (ATSC Program and System Information Protocol) in the USA
or DVB-SI (Digital Video Broadcast-Service Information) in Europe); (ii) from specialized data services (in addition to
PSIP/DVB-SI); (iii) from specialized web sites; (iv) from the media storage unit containing the audiovisual content (e.g.,
DVD); (v) from advanced cameras (discussed later), and/or may be generated (i.e., for programs that are being stored)
by the analysis module 42 or by user input 48.

[0037] Contents of digital still and video cameras can be stored and managed by a system that implements the
description schemes, e.g., a system as shown in FIG. 2. Advanced cameras can store a program description scheme,
for instance, in addition to the audiovisual content itself. The program description scheme can be generated either in
part or in its entirety on the camera itself via an appropriate user input interface (e.g., speech, visual menu drive, etc.).
Users can input to the camera the program description scheme information, especially high-level (or semantic)
information that may otherwise be difficult for the system to extract automatically. Some camera settings and parameters
(e.g., date and time), as well as quantities computed in the camera (e.g., color histogram to be included in the color
profile), can also be used in generating the program description scheme. Once the camera is connected, the system
can browse the camera content, or transfer the camera content and its description scheme to the local storage for future
use. It is also possible to update or add information to the description scheme generated in the camera.
[0038] The IEEE 1394 and HAVi standard specifications enable this type of "audiovisual content" centric communication
among devices. The description scheme APIs can be used in the context of HAVi to browse and/or search the
contents of a camera or a DVD which also contain a description scheme associated with their content, i.e., doing more
than merely invoking the PLAY API to play back and linearly view the media.

[0039] The description schemes may be used in archiving audiovisual programs in a database. The search engine
uses the information contained in the program description scheme to retrieve programs on the basis of their content. The
program description scheme can also be used in navigating through the contents of the database or the query results.
The user description scheme can be used in prioritizing the results of the user query during presentation. It is possible
of course to make the program description scheme more comprehensive depending on the nature of the particular
application.
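
By way of illustration only, the retrieval and prioritization described above might be sketched in Python as follows. The dictionary layout, the `search` and `prioritize` helpers, and the sample programs are assumptions made for this example and are not part of the description schemes; a keyword match stands in for the search engine, and an ordered list of preferred categories stands in for the user description scheme.

```python
def search(programs, query_keyword):
    """Retrieve programs whose (hypothetical) keyword set contains the query keyword."""
    return [p for p in programs if query_keyword in p["keywords"]]


def prioritize(results, preferred_categories):
    """Order results so that programs in the user's preferred categories come first.

    sorted() is stable, so programs with equal rank keep their retrieval order.
    """
    rank = {category: i for i, category in enumerate(preferred_categories)}
    return sorted(results, key=lambda p: rank.get(p["category"], len(rank)))


# Hypothetical archive entries standing in for program description schemes.
programs = [
    {"name": "Market Report", "category": "business", "keywords": {"microsoft", "stocks"}},
    {"name": "Evening News", "category": "news", "keywords": {"microsoft", "trial"}},
    {"name": "Tech Talk", "category": "technology", "keywords": {"microsoft"}},
    {"name": "Bulls Game", "category": "sports", "keywords": {"basketball"}},
]

results = search(programs, "microsoft")
ordered = prioritize(results, ["news", "technology"])
print([p["name"] for p in ordered])
```

Keeping retrieval and prioritization as separate steps mirrors the division of roles in the paragraph above: the program description scheme decides what matches, and the user description scheme only decides the order of presentation.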

[0040] The description scheme fulfills the user's desire to have applications that pay attention and are responsive
to their viewing and usage habits, preferences, and personal demographics. The proposed user description scheme
directly addresses this desire in its selection of fields and interrelationship to other description schemes. Because the
description schemes are modular in nature, the user can port his user description scheme from one device to another
in order to "personalize" the device.

[0041] The proposed description schemes can be incorporated into current products similar to those from TiVo and
Replay TV in order to extend their entertainment informational value. In particular, the description scheme will enable
audiovisual browsing and searching of programs and enable filtering within a particular program by supporting multiple
program views such as the highlight view. In addition, the description scheme will handle programs coming from
sources other than television broadcasts, which TiVo and Replay TV are not designed to handle. In addition, by
standardization of TiVo and Replay TV type of devices, other products may be interconnected to such devices to extend
their capabilities, such as devices supporting an MPEG-7 description. MPEG-7 is the Moving Pictures Experts Group - 7,
acting to standardize descriptions and description schemes for audiovisual information. The device may also be
extended to be personalized by multiple users, as desired.

[0042] Because the description scheme is defined, intelligent software agents can communicate among themselves
to make intelligent inferences regarding the user's preferences. In addition, the development and upgrade of
intelligent software agents for browsing and filtering applications can be simplified based on the standardized user
description scheme.

[0043] The description scheme is multi-modal in the sense that it holds both high level (semantic) and low
level features and/or descriptors. For example, the high and low level descriptors are actor name and motion model
parameters, respectively. High level descriptors are easily readable by humans while low level descriptors are more
easily read by machines and less understandable by humans. The program description scheme can be readily harmonized
with existing EPG, PSIP, and DVB-SI information, facilitating search and filtering of broadcast programs. Existing
services can be extended in the future by incorporating additional information using the compliant description scheme.
[0044] For example, one case may include audiovisual programs that are prerecorded on a medium such as a digital
video disc where the digital video disc also contains a description scheme that has the same syntax and semantics as
the description scheme that the SFB module uses. If the SFB module uses a different description scheme, a transcoder
(converter) of the description scheme may be employed. The user may want to browse and view the content of the digital
video disc. In this case, the user may not need to invoke the analysis module to author a program description. However,
the user may want to invoke his or her user description scheme in filtering, searching and browsing the digital
video disc content. Other sources of program information may likewise be used in the same manner.
[0045] It is to be understood that any of the techniques described herein with relation to video are equally applicable
to images (such as still image or a frame of a video) and audio (such as radio).

[0046] An example of an audiovisual interface is shown in FIGS. 4-12 which is suitable for the preferred audiovisual
description scheme. Referring to FIG. 4, selecting the thumbnail function as a function of category provides a display
with a set of categories on the left hand side. Selecting a particular category, such as news, provides a set of thumbnail
views of different programs that are currently available for viewing. In addition, the different programs may also include
programs that will be available at a different time for viewing. The thumbnail views are short video segments that provide
an indication of the content of the respective actual program that they correspond with. Referring to FIG. 5, a thumbnail
view of available programs in terms of channels may be displayed, if desired. Referring to FIG. 6, a text view of
available programs in terms of channels may be displayed, if desired. Referring to FIG. 7, a frame view of particular programs
may be displayed, if desired. A representative frame is displayed in the center of the display with a set of representative
frames of different programs in the left hand column. The frequency of the number of frames may be selected,
as desired. Also a set of frames is displayed on the lower portion of the display representative of different frames during
the particular selected program. Referring to FIG. 8, a shot view of particular programs may be displayed, as
desired. A representative frame of a shot is displayed in the center of the display with a set of representative frames of
different programs in the left hand column. Also a set of shots is displayed on the lower portion of the display representative
of different shots (segments of a program, typically sequential in nature) during the particular selected program.
Referring to FIG. 9, a key frame view of particular programs may be displayed, as desired. A representative frame
is displayed in the center of the display with a set of representative frames of different programs in the left hand column.
Also a set of key frame views is displayed on the lower portion of the display representative of different key frame portions
during the particular selected program. The number of key frames in each key frame view can be adjusted by
selecting the level. Referring to FIG. 10, a highlight view may likewise be displayed, as desired. Referring to FIG. 11, an
event view may likewise be displayed, as desired. Referring to FIG. 12, a character/object view may likewise be displayed,
as desired.

[0047] An example of the description schemes is shown below in XML. The description scheme may be implemented
in any language and include any of the included descriptions (or more), as desired.
[0048] The proposed program description scheme includes three major sections for describing a video program.
The first section identifies the described program. The second section defines a number of views which may be useful
in browsing applications. The third section defines a number of profiles which may be useful in filtering and search applications.
Therefore, the overall structure of the proposed description scheme is as follows:

    <?XML version="1.0"?>
    <!DOCTYPE MPEG-7 SYSTEM "mpeg-7.dtd">
    <ProgramIdentity>
        <ProgramID> ... </ProgramID>
        <ProgramName> ... </ProgramName>
        <SourceLocation> ... </SourceLocation>
    </ProgramIdentity>
    <ProgramViews>
        <ThumbnailView> ... </ThumbnailView>
        <SlideView> ... </SlideView>
        <FrameView> ... </FrameView>
        <ShotView> ... </ShotView>
        <KeyFrameView> ... </KeyFrameView>
        <HighlightView> ... </HighlightView>
        <EventView> ... </EventView>
        <CloseUpView> ... </CloseUpView>
        <AlternateView> ... </AlternateView>
    </ProgramViews>
    <ProgramProfiles>
        <GeneralProfile> ... </GeneralProfile>
        <CategoryProfile> ... </CategoryProfile>
        <DateTimeProfile> ... </DateTimeProfile>
        <KeywordProfile> ... </KeywordProfile>
        <TriggerProfile> ... </TriggerProfile>
        <StillProfile> ... </StillProfile>
        <EventProfile> ... </EventProfile>
        <CharacterProfile> ... </CharacterProfile>
        <ObjectProfile> ... </ObjectProfile>
        <ColorProfile> ... </ColorProfile>
        <TextureProfile> ... </TextureProfile>
        <ShapeProfile> ... </ShapeProfile>
        <MotionProfile> ... </MotionProfile>
    </ProgramProfiles>
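
As a rough illustration of how an application might read the three sections of this structure, the following Python sketch parses a small description with the standard-library `xml.etree.ElementTree` module. The enclosing `<ProgramDescription>` root element, the sample values, and the `summarize` helper are assumptions made for the example; the listing above does not name a root element, and one is added here only because an XML parser requires it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The <ProgramDescription> wrapper is an assumption,
# added only because an XML parser needs a single root element.
SAMPLE = """
<ProgramDescription>
  <ProgramIdentity>
    <ProgramID>12345</ProgramID>
    <ProgramName>Evening News</ProgramName>
    <SourceLocation>http://example.com/news.mpg</SourceLocation>
  </ProgramIdentity>
  <ProgramViews>
    <FrameView>0 53999</FrameView>
  </ProgramViews>
  <ProgramProfiles>
    <KeywordProfile>news weather</KeywordProfile>
  </ProgramProfiles>
</ProgramDescription>
"""


def summarize(xml_text):
    """Return the program identity plus the names of the views and profiles present."""
    root = ET.fromstring(xml_text)
    identity = {child.tag: (child.text or "").strip()
                for child in root.find("ProgramIdentity")}
    views = [child.tag for child in root.find("ProgramViews")]
    profiles = [child.tag for child in root.find("ProgramProfiles")]
    return identity, views, profiles


identity, views, profiles = summarize(SAMPLE)
print(identity["ProgramName"], views, profiles)
```

A browsing application would mainly consult the views, while a filtering or search application would mainly consult the profiles, in line with the roles given to the second and third sections.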
Program Identity

[0049]

• Program ID

    <ProgramID> program-id </ProgramID>

The descriptor <ProgramID> contains a number or a string to identify a program.

• Program name

    <ProgramName> program-name </ProgramName>

The descriptor <ProgramName> specifies the name of a program.

• Source location

    <SourceLocation> source-url </SourceLocation>

The descriptor <SourceLocation> specifies the location of a program in URL format.

Program Views

[0050]

• Thumbnail view

    <ThumbnailView>
        <Image> thumbnail-image </Image>
    </ThumbnailView>

The descriptor <ThumbnailView> specifies an image as the thumbnail representation of a program.

• Slide view

    <SlideView> frame-id ... </SlideView>

The descriptor <SlideView> specifies a number of frames in a program which may be viewed as snapshots or
in a slide show manner.

• Frame view

    <FrameView> start-frame-id end-frame-id </FrameView>

The descriptor <FrameView> specifies the start and end frames of a program. This is the most basic view of a
program and any program has a frame view.
• Shot view

    <ShotView>
        <Shot id=""> start-frame-id end-frame-id display-frame-id </Shot>
        <Shot id=""> start-frame-id end-frame-id display-frame-id </Shot>
    </ShotView>

The descriptor <ShotView> specifies a number of shots in a program. The <Shot> descriptor defines the start
and end frames of a shot. It may also specify a frame to represent the shot.
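
The shot view might be read into a simple in-memory list as in the following Python sketch. It assumes, for the sake of the example, that frame ids are integers separated by whitespace and that the display frame is optional, as the description above suggests; the `Shot` tuple, the `parse_shots` helper, and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class Shot(NamedTuple):
    shot_id: str
    start: int
    end: int
    display: Optional[int]  # frame representing the shot, if one is given


# Hypothetical sample; integer frame ids are an assumption of this sketch.
SHOT_VIEW = """
<ShotView>
  <Shot id="1">0 120 60</Shot>
  <Shot id="2">121 300</Shot>
</ShotView>
"""


def parse_shots(xml_text):
    """Return one Shot per <Shot> element, in document order."""
    root = ET.fromstring(xml_text)
    shots = []
    for elem in root.findall("Shot"):
        tokens = elem.text.split()
        display = int(tokens[2]) if len(tokens) > 2 else None
        shots.append(Shot(elem.get("id"), int(tokens[0]), int(tokens[1]), display))
    return shots


print(parse_shots(SHOT_VIEW))
```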

• Key-frame view

    <KeyFrameView>
        <KeyFrames level="">
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
        </KeyFrames>
        <KeyFrames level="">
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
        </KeyFrames>
    </KeyFrameView>

The descriptor <KeyFrameView> specifies key frames in a program. The key frames may be organized in a
hierarchical manner and the hierarchy is captured by the descriptor <KeyFrames> with a level attribute. The clips
which are associated with each key frame are defined by the descriptor <Clip>. Here the display frame in each clip
is the corresponding key frame.
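
A browsing application could use the level attribute to choose how many key frames to show. The Python sketch below is one possible reading, assuming that levels are written as integers and that the third value in each `<Clip>` is the key frame; the `key_frames_at_level` helper and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; integer levels and frame ids are assumptions of this sketch.
KEY_FRAME_VIEW = """
<KeyFrameView>
  <KeyFrames level="1">
    <Clip id="1">0 899 450</Clip>
  </KeyFrames>
  <KeyFrames level="2">
    <Clip id="1">0 449 200</Clip>
    <Clip id="2">450 899 700</Clip>
  </KeyFrames>
</KeyFrameView>
"""


def key_frames_at_level(xml_text, level):
    """Return the display (key) frame ids stored under the given level, or [] if absent."""
    root = ET.fromstring(xml_text)
    for group in root.findall("KeyFrames"):
        if group.get("level") == str(level):
            # The third token of each clip is its display frame, i.e. the key frame.
            return [int(clip.text.split()[2]) for clip in group.findall("Clip")]
    return []


print(key_frames_at_level(KEY_FRAME_VIEW, 2))
```

Selecting a different level simply switches to another `<KeyFrames>` group, which corresponds to adjusting the number of key frames shown in the key frame view of the interface.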
• Highlight view

    <HighlightView>
        <Highlight length="">
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
        </Highlight>
        <Highlight length="">
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
        </Highlight>
    </HighlightView>


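The descriptor <HighlightView> specifies clips which form highlights of a program. The clips are grouped into
highlights, each of which is specified by the descriptor <Highlight> with a length attribute.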
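
The length attribute lends itself to requests such as the 5 minute highlight mentioned earlier. The following Python sketch picks the longest stored highlight that does not exceed a requested duration. It assumes that length is expressed in seconds and that frame ids are integers; neither unit is fixed by the listing, and the `pick_highlight` helper and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; length in seconds is an assumption of this sketch.
HIGHLIGHT_VIEW = """
<HighlightView>
  <Highlight length="120">
    <Clip id="1">1000 2800 1500</Clip>
    <Clip id="2">9000 10800 9500</Clip>
  </Highlight>
  <Highlight length="300">
    <Clip id="1">1000 4000 1500</Clip>
    <Clip id="2">9000 12000 9500</Clip>
    <Clip id="3">20000 23000 21000</Clip>
  </Highlight>
</HighlightView>
"""


def pick_highlight(xml_text, max_seconds):
    """Return (length, [(start, end), ...]) for the longest highlight that fits, or None."""
    root = ET.fromstring(xml_text)
    best = None
    for highlight in root.findall("Highlight"):
        length = int(highlight.get("length"))
        if length <= max_seconds and (best is None or length > best[0]):
            clips = [tuple(int(tok) for tok in clip.text.split()[:2])
                     for clip in highlight.findall("Clip")]
            best = (length, clips)
    return best


print(pick_highlight(HIGHLIGHT_VIEW, 300))
```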
• Event view

    <EventView>
        <Events name="">
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
        </Events>
        <Events name="">
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
        </Events>
    </EventView>

The descriptor <EventView> specifies clips which are related to certain events in a program. The clips are
grouped into the corresponding events which are specified by the descriptor <Events> with a name attribute.

• Close-up view

    <CloseUpView>
        <Target name="">
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
        </Target>
        <Target name="">
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
            <Clip id=""> start-frame-id end-frame-id display-frame-id </Clip>
        </Target>
    </CloseUpView>

The descriptor <CloseUpView> specifies clips which may be zoomed in to certain targets in a program. The
clips are grouped into the corresponding targets which are specified by the descriptor <Target> with a name
attribute.

• Alternate view

    <AlternateView>
        <AlternateSource id=""> source-url </AlternateSource>
        <AlternateSource id=""> source-url </AlternateSource>
    </AlternateView>

The descriptor <AlternateView> specifies sources which may be shown as alternate views of a program. Each
alternate view is specified by the descriptor <AlternateSource> with an id attribute. The location of the source may
be specified in URL format.
Program Profiles

[0051]

• General profile

    <GeneralProfile>
        <Title> title-text </Title>
        <Abstract> abstract-text </Abstract>
        <Audio> voice-annotation </Audio>
        <Www> web-page-url </Www>
        <ClosedCaption> yes/no </ClosedCaption>
        <Language> language-name </Language>
        <Rating> rating </Rating>
        <Length> time </Length>
        <Authors> author-name </Authors>
        <Producers> producer-name ... </Producers>
        <Directors> director-name ... </Directors>
        <Actors> actor-name ... </Actors>
    </GeneralProfile>

The descriptor <GeneralProfile> describes the general aspects of a program.
• Category profile

    <CategoryProfile> category-name ... </CategoryProfile>

The descriptor <CategoryProfile> specifies the categories under which a program may be classified.
• Date-time profile

    <DateTimeProfile>
        <ProductionDate> date </ProductionDate>
        <ReleaseDate> date </ReleaseDate>
        <RecordingDate> date </RecordingDate>
        <RecordingTime> time </RecordingTime>
    </DateTimeProfile>

The descriptor <DateTimeProfile> specifies various date and time information of a program.
• Keyword profile

    <KeywordProfile> keyword ... </KeywordProfile>

The descriptor <KeywordProfile> specifies a number of keywords which may be used to filter or search a program.

• Trigger profile

    <TriggerProfile> trigger-frame-id ... </TriggerProfile>

The descriptor <TriggerProfile> specifies a number of frames in a program which may be used to trigger certain
actions during the playback of the program.
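
One way a player might act on the trigger profile is to check, after each playback step, which trigger frames have just been passed. The Python sketch below assumes integer frame ids and leaves the triggered action itself unspecified, since the description does not define it; the `parse_trigger_profile` and `fired_triggers` helpers and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; integer frame ids are an assumption of this sketch.
TRIGGER_PROFILE = "<TriggerProfile>300 1500 4200</TriggerProfile>"


def parse_trigger_profile(xml_text):
    """Return the trigger frame ids listed in a <TriggerProfile> element."""
    root = ET.fromstring(xml_text)
    return [int(tok) for tok in (root.text or "").split()]


def fired_triggers(trigger_frames, previous_frame, current_frame):
    """Return trigger frames passed as playback advanced from previous_frame
    (exclusive) to current_frame (inclusive)."""
    return [f for f in sorted(trigger_frames) if previous_frame < f <= current_frame]


triggers = parse_trigger_profile(TRIGGER_PROFILE)
print(fired_triggers(triggers, 0, 1500))
```

Using a half-open interval means each trigger frame is reported exactly once as playback advances, even when the player steps by more than one frame at a time.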

• Still profile

    <StillProfile>
        <Still id="">
            <HotRegion id="">
                <Location> x1 y1 x2 y2 </Location>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
                <Www> web-page-url </Www>
            </HotRegion>
            <HotRegion id="">
                <Location> x1 y1 x2 y2 </Location>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
                <Www> web-page-url </Www>
            </HotRegion>
        </Still>
        <Still id="">
            <HotRegion id="">
                <Location> x1 y1 x2 y2 </Location>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
                <Www> web-page-url </Www>
            </HotRegion>
            <HotRegion id="">
                <Location> x1 y1 x2 y2 </Location>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
                <Www> web-page-url </Www>
            </HotRegion>
        </Still>
    </StillProfile>

The descriptor <StillProfile> specifies hot regions or regions of interest within a frame. The frame is specified
by the descriptor <Still> with an id attribute which corresponds to the frame-id. Within a frame, each hot region is
specified by the descriptor <HotRegion> with an id attribute.
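
A viewer could use the still profile to find which hot region, if any, lies under a pointer position in a paused frame. The Python sketch below assumes that `<Location>` holds two opposite corners of an axis-aligned rectangle in integer pixel coordinates, with x1 <= x2 and y1 <= y2; the listing does not state the coordinate convention, and the `hot_region_at` helper and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; rectangle corners in pixels are an assumption of this sketch.
STILL_PROFILE = """
<StillProfile>
  <Still id="450">
    <HotRegion id="1">
      <Location>10 20 110 220</Location>
      <Text>Scoreboard</Text>
      <Www>http://example.com/score</Www>
    </HotRegion>
    <HotRegion id="2">
      <Location>200 50 300 150</Location>
      <Text>Team logo</Text>
    </HotRegion>
  </Still>
</StillProfile>
"""


def hot_region_at(xml_text, frame_id, x, y):
    """Return (region id, text annotation) for the first hot region in the given
    frame that contains the point (x, y), or None if there is none."""
    root = ET.fromstring(xml_text)
    for still in root.findall("Still"):
        if still.get("id") != str(frame_id):
            continue
        for region in still.findall("HotRegion"):
            x1, y1, x2, y2 = (int(tok) for tok in region.findtext("Location").split())
            if x1 <= x <= x2 and y1 <= y <= y2:
                return region.get("id"), region.findtext("Text", default="").strip()
    return None


print(hot_region_at(STILL_PROFILE, 450, 50, 100))
```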



• Event profile

    <EventProfile>
        <EventList> event-name ... </EventList>
        <Event name="">
            <Www> web-page-url </Www>
            <Occurrence id="">
                <Duration> start-frame-id end-frame-id </Duration>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
            </Occurrence>
            <Occurrence id="">
                <Duration> start-frame-id end-frame-id </Duration>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
            </Occurrence>
            ...
        </Event>
        <Event name="">
            <Www> web-page-url </Www>
            <Occurrence id="">
                <Duration> start-frame-id end-frame-id </Duration>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
            </Occurrence>
            <Occurrence id="">
                <Duration> start-frame-id end-frame-id </Duration>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
            </Occurrence>
        </Event>
        ...
    </EventProfile>



The descriptor <EventProfile> specifies the detailed information for certain events in a program. Each event is
specified by the descriptor <Event> with a name attribute. Each occurrence of an event is specified by the descriptor
<Occurrence> with an id attribute which may be matched with a clip id under <EventView>.
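
The matching of occurrence ids with clip ids might be used to attach annotations from the event profile to the clips of the event view. The Python sketch below shows one such join. The `<ProgramDescription>` root element, the id values, and the `annotated_clips` helper are assumptions made for the example; the element names follow the listings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the root element and id values are assumptions of this sketch.
DESCRIPTION = """
<ProgramDescription>
  <EventView>
    <Events name="goal">
      <Clip id="g1">1000 1300 1150</Clip>
      <Clip id="g2">8000 8400 8200</Clip>
    </Events>
  </EventView>
  <EventProfile>
    <EventList>goal</EventList>
    <Event name="goal">
      <Occurrence id="g1">
        <Duration>1000 1300</Duration>
        <Text>First goal</Text>
      </Occurrence>
      <Occurrence id="g2">
        <Duration>8000 8400</Duration>
        <Text>Second goal</Text>
      </Occurrence>
    </Event>
  </EventProfile>
</ProgramDescription>
"""


def annotated_clips(xml_text, event_name):
    """Return (clip id, display frame, text annotation) for each clip of the named
    event, joining EventView clips to EventProfile occurrences by id."""
    root = ET.fromstring(xml_text)
    notes = {}
    for event in root.iterfind("EventProfile/Event"):
        if event.get("name") == event_name:
            for occurrence in event.findall("Occurrence"):
                notes[occurrence.get("id")] = occurrence.findtext("Text", default="").strip()
    result = []
    for group in root.iterfind("EventView/Events"):
        if group.get("name") == event_name:
            for clip in group.findall("Clip"):
                display = int(clip.text.split()[2])
                result.append((clip.get("id"), display, notes.get(clip.get("id"))))
    return result


print(annotated_clips(DESCRIPTION, "goal"))
```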






• Character profile

    <CharacterProfile>
        <CharacterList> character-name ... </CharacterList>
        <Character name="">
            <ActorName> actor-name </ActorName>
            <Gender> male </Gender>
            <Age> age </Age>
            <Www> web-page-url </Www>
            <Occurrence id="">
                <Duration> start-frame-id end-frame-id </Duration>
                <Location> frame: [x1 y1 x2 y2] ... </Location>
                <Motion> vx vy vz vα vβ vγ </Motion>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
            </Occurrence>
            <Occurrence id="">
                <Duration> start-frame-id end-frame-id </Duration>
                <Location> frame: [x1 y1 x2 y2] </Location>
                <Motion> vx vy vz vα vβ vγ </Motion>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
            </Occurrence>
        </Character>
        <Character name="">
            <ActorName> actor-name </ActorName>
            <Gender> male </Gender>
            <Age> age </Age>
            <Www> web-page-url </Www>
            <Occurrence id="">
                <Duration> start-frame-id end-frame-id </Duration>
                <Location> frame: [x1 y1 x2 y2] ... </Location>
                <Motion> vx vy vz vα vβ vγ </Motion>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
            </Occurrence>
            <Occurrence id="">
                <Duration> start-frame-id end-frame-id </Duration>
                <Location> frame: [x1 y1 x2 y2] </Location>
                <Motion> vx vy vz vα vβ vγ </Motion>
                <Text> text-annotation </Text>
                <Audio> voice-annotation </Audio>
            </Occurrence>
        </Character>
    </CharacterProfile>

The descriptor <CharacterProfile> specifies the detailed information for certain characters in a program. Each
character is specified by the descriptor <Character> with a name attribute. Each occurrence of a character is specified
by the descriptor <Occurrence> with an id attribute which may be matched with a clip id under
<CloseUpView>.



• Object profile

<ObjectProfile>
    <ObjectList> object-name ... </ObjectList>
    <Object name="">
        <Www> web-page-url </Www>
        <Occurrence id="">
            <Duration> start-frame-id end-frame-id </Duration>
            <Location> frame: [x1 y1 x2 y2] ... </Location>
            <Motion> v_x v_y v_z v_α v_β v_γ </Motion>
            <Text> text-annotation </Text>
            <Audio> voice-annotation </Audio>
        </Occurrence>
        <Occurrence id="">
            <Duration> start-frame-id end-frame-id </Duration>
            <Location> frame: [x1 y1 x2 y2] ... </Location>
            <Motion> v_x v_y v_z v_α v_β v_γ </Motion>
            <Text> text-annotation </Text>
            <Audio> voice-annotation </Audio>
        </Occurrence>
    </Object>
    <Object name="">
        <Www> web-page-url </Www>
        <Occurrence id="">
            <Duration> start-frame-id end-frame-id </Duration>
            <Location> frame: [x1 y1 x2 y2] ... </Location>
            <Motion> v_x v_y v_z v_α v_β v_γ </Motion>
            <Text> text-annotation </Text>
            <Audio> voice-annotation </Audio>
        </Occurrence>
        <Occurrence id="">
            <Duration> start-frame-id end-frame-id </Duration>
            <Location> frame: [x1 y1 x2 y2] ... </Location>
            <Motion> v_x v_y v_z v_α v_β v_γ </Motion>
            <Text> text-annotation </Text>
            <Audio> voice-annotation </Audio>
        </Occurrence>
    </Object>
</ObjectProfile>

The descriptor <ObjectProfile> specifies the detailed information for certain objects in a program. Each object
is specified by the descriptor <Object> with a name attribute. Each occurrence of an object is specified by the
descriptor <Occurrence> with an id attribute which may be matched with a clip id under <CloseUpView>.

• Color profile

<ColorProfile>
</ColorProfile>

The descriptor <ColorProfile> specifies the detailed color information of a program. All MPEG-7 color
descriptors may be placed under here.

• Texture profile

<TextureProfile>
</TextureProfile>

The descriptor <TextureProfile> specifies the detailed texture information of a program. All MPEG-7 texture
descriptors may be placed under here.

• Shape profile

<ShapeProfile>
</ShapeProfile>

The descriptor <ShapeProfile> specifies the detailed shape information of a program. All MPEG-7 shape
descriptors may be placed under here.

• Motion profile

<MotionProfile>
</MotionProfile>

The descriptor <MotionProfile> specifies the detailed motion information of a program. All MPEG-7 motion
descriptors may be placed under here.

User Description Scheme 

[0052] The proposed user description scheme includes three major sections for describing a user. The first section
identifies the described user. The second section records a number of settings which may be preferred by the user. The
third section records some statistics which may reflect certain usage patterns of the user. Therefore, the overall
structure of the proposed description scheme is as follows:

<?XML version="1.0">
<!DOCTYPE MPEG-7 SYSTEM "mpeg-7.dtd">
<UserIdentity>
    <UserID> ... </UserID>
    <UserName> ... </UserName>
</UserIdentity>
<UserPreferences>
    <BrowsingPreferences> ... </BrowsingPreferences>
    <FilteringPreferences> ... </FilteringPreferences>
    <SearchPreferences> ... </SearchPreferences>
    <DevicePreferences> ... </DevicePreferences>
</UserPreferences>
<UserHistory>
    <BrowsingHistory> ... </BrowsingHistory>
    <FilteringHistory> ... </FilteringHistory>
    <SearchHistory> ... </SearchHistory>
    <DeviceHistory> ... </DeviceHistory>
</UserHistory>
<UserDemographics>
    <Age> ... </Age>
    <Gender> ... </Gender>
    <ZIP> ... </ZIP>
</UserDemographics>



User Identity

[0053]

• User ID

<UserID> user-id </UserID>

The descriptor <UserID> contains a number or a string to identify a user.

• User name

<UserName> user-name </UserName>

The descriptor <UserName> specifies the name of a user.

User Preferences

[0054]

• Browsing preferences

<BrowsingPreferences>
    <Views>
        <ViewCategory id=""> view-id ... </ViewCategory>
        <ViewCategory id=""> view-id ... </ViewCategory>
    </Views>
    <FrameFrequency> frequency ... </FrameFrequency>
    <ShotFrequency> frequency ... </ShotFrequency>
    <KeyFrameLevel> level-id ... </KeyFrameLevel>
    <HighlightLength> length ... </HighlightLength>
</BrowsingPreferences>

The descriptor <BrowsingPreferences> specifies the browsing preferences of a user. The user's preferred
views are specified by the descriptor <Views>. For each category, the preferred views are specified by the
descriptor <ViewCategory> with an id attribute which corresponds to the category id. The descriptor
<FrameFrequency> specifies at what interval the frames should be displayed on a browsing slider under the frame
view. The descriptor <ShotFrequency> specifies at what interval the shots should be displayed on a browsing slider
under the shot view. The descriptor <KeyFrameLevel> specifies at what level the key frames should be displayed
on a browsing slider under the key frame view. The descriptor <HighlightLength> specifies which version of the
highlight should be shown under the highlight view.

• Filtering preferences

<FilteringPreferences>
    <Categories> category-name ... </Categories>
    <Channels> channel-number ... </Channels>
    <Ratings> rating-id ... </Ratings>
    <Shows> show-name ... </Shows>
    <Authors> author-name ... </Authors>
    <Producers> producer-name ... </Producers>
    <Directors> director-name ... </Directors>
    <Actors> actor-name ... </Actors>
    <Keywords> keyword ... </Keywords>
    <Titles> title-text ... </Titles>
</FilteringPreferences>

The descriptor <FilteringPreferences> specifies the filtering related preferences of a user.
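As a non-limiting sketch, a filtering agent might load the <FilteringPreferences> descriptors and test a program's attributes against them as shown below. The matching rule (a program passes when it shares at least one value with every descriptor the user has filled in), the whitespace-separated value encoding, and the names load_preferences and passes_filter are assumptions made for illustration; the description scheme itself does not prescribe them.

```python
import xml.etree.ElementTree as ET

# Hypothetical preference instance; the values are illustrative only.
PREFS_XML = """
<FilteringPreferences>
  <Categories>sports news</Categories>
  <Channels>4 7</Channels>
  <Keywords>basketball</Keywords>
</FilteringPreferences>
"""


def load_preferences(xml_text):
    """Return {descriptor name: set of whitespace-separated values}."""
    root = ET.fromstring(xml_text)
    return {child.tag: set((child.text or "").split()) for child in root}


def passes_filter(program, prefs):
    """Assumed rule: for every descriptor the user filled in, the program
    must share at least one value with the preference list."""
    for name, wanted in prefs.items():
        if wanted and not (wanted & set(program.get(name, ()))):
            return False
    return True


prefs = load_preferences(PREFS_XML)
game = {"Categories": ["sports"], "Channels": ["4"], "Keywords": ["basketball", "playoffs"]}
movie = {"Categories": ["drama"], "Channels": ["4"], "Keywords": ["romance"]}
print(passes_filter(game, prefs), passes_filter(movie, prefs))  # True False
```

Values containing spaces, such as multi-word show names, would need a different encoding than the whitespace splitting assumed here.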
• Search preferences

<SearchPreferences>
    <Categories> category-name ... </Categories>
    <Channels> channel-number ... </Channels>
    <Ratings> rating-id ... </Ratings>
    <Shows> show-name ... </Shows>
    <Authors> author-name ... </Authors>
    <Producers> producer-name ... </Producers>
    <Directors> director-name ... </Directors>
    <Actors> actor-name ... </Actors>
    <Keywords> keyword ... </Keywords>
    <Titles> title-text ... </Titles>
</SearchPreferences>

The descriptor <SearchPreferences> specifies the search related preferences of a user.

• Device preferences

<DevicePreferences>
    <Brightness> brightness-value </Brightness>
    <Contrast> contrast-value </Contrast>
    <Volume> volume-value </Volume>
</DevicePreferences>

The descriptor <DevicePreferences> specifies the device preferences of a user.
Usage History

[0055]

• Browsing history

<BrowsingHistory>
    <Views>
        <ViewCategory id=""> view-id ... </ViewCategory>
        <ViewCategory id=""> view-id ... </ViewCategory>
    </Views>
    <FrameFrequency> frequency ... </FrameFrequency>
    <ShotFrequency> frequency ... </ShotFrequency>
    <KeyFrameLevel> level-id ... </KeyFrameLevel>
    <HighlightLength> length ... </HighlightLength>
</BrowsingHistory>

The descriptor <BrowsingHistory> captures the history of a user's browsing related activities.
• Filtering history

<FilteringHistory>
    <Categories> category-name ... </Categories>
    <Channels> channel-number ... </Channels>
    <Ratings> rating-id ... </Ratings>
    <Shows> show-name ... </Shows>
    <Authors> author-name ... </Authors>
    <Producers> producer-name ... </Producers>
    <Directors> director-name ... </Directors>
    <Actors> actor-name ... </Actors>
    <Keywords> keyword ... </Keywords>
    <Titles> title-text ... </Titles>
</FilteringHistory>

The descriptor <FilteringHistory> captures the history of a user's filtering related activities.

• Search history

<SearchHistory>
    <Categories> category-name ... </Categories>
    <Channels> channel-number ... </Channels>
    <Ratings> rating-id ... </Ratings>
    <Shows> show-name ... </Shows>
    <Authors> author-name ... </Authors>
    <Producers> producer-name ... </Producers>
    <Directors> director-name ... </Directors>
    <Actors> actor-name ... </Actors>
    <Keywords> keyword ... </Keywords>
    <Titles> title-text ... </Titles>
</SearchHistory>

The descriptor <SearchHistory> captures the history of a user's search related activities.

• Device history

<DeviceHistory>
    <Brightness> brightness-value ... </Brightness>
    <Contrast> contrast-value ... </Contrast>
    <Volume> volume-value ... </Volume>
</DeviceHistory>

The descriptor <DeviceHistory> captures the history of a user's device related activities.

User demographics

[0056]

• Age

<Age> age </Age>

The descriptor <Age> specifies the age of a user.

• Gender

<Gender> ... </Gender>

The descriptor <Gender> specifies the gender of a user.

• ZIP code

<ZIP> ... </ZIP>

The descriptor <ZIP> specifies the ZIP code of where a user lives.
System Description Scheme 

[0057] The proposed system description scheme includes four major sections for describing a system. The first
section identifies the described system. The second section keeps a list of all known users. The third section keeps
lists of available programs. The fourth section describes the capabilities of the system. Therefore, the overall
structure of the proposed description scheme is as follows:

<?XML version="1.0">
<!DOCTYPE MPEG-7 SYSTEM "mpeg-7.dtd">
<SystemIdentity>
    <SystemID> ... </SystemID>
    <SystemName> ... </SystemName>
    <SystemSerialNumber> ... </SystemSerialNumber>
</SystemIdentity>
<SystemUsers>
    <Users> ... </Users>
</SystemUsers>
<SystemPrograms>
    <Categories> ... </Categories>
    <Channels> ... </Channels>
    <Programs> ... </Programs>
</SystemPrograms>
<SystemCapabilities>
    <Views> ... </Views>
</SystemCapabilities>



System Identity

[0058]

• System ID

<SystemID> system-id </SystemID>

The descriptor <SystemID> contains a number or a string to identify a video system or device.

• System name

<SystemName> system-name </SystemName>

The descriptor <SystemName> specifies the name of a video system or device.

• System serial number

<SystemSerialNumber> system-serial-number </SystemSerialNumber>

The descriptor <SystemSerialNumber> specifies the serial number of a video system or device.
System Users

[0059]

• Users

<Users>
    <User>
        <UserID> user-id </UserID>
        <UserName> user-name </UserName>
    </User>
    <User>
        <UserID> user-id </UserID>
        <UserName> user-name </UserName>
    </User>
</Users>

The descriptor <SystemUsers> lists a number of users who have registered on a video system or device. Each
user is specified by the descriptor <User>. The descriptor <UserID> specifies a number or a string which should
match with the number or string specified in <UserID> in one of the user description schemes.
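The matching requirement between the system description scheme and the user description schemes may be checked, for example, along the lines of the following sketch. The sample instances and the names registered_ids and unmatched_users are hypothetical and serve only to illustrate the cross-reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical <Users> section of a system description scheme.
SYSTEM_USERS = """
<Users>
  <User><UserID>u1</UserID><UserName>Alice</UserName></User>
  <User><UserID>u2</UserID><UserName>Bob</UserName></User>
</Users>
"""

# Hypothetical identity sections taken from the available user description schemes.
USER_DESCRIPTIONS = [
    "<UserIdentity><UserID>u1</UserID><UserName>Alice</UserName></UserIdentity>",
]


def registered_ids(users_xml):
    """Return the user ids registered on the system, in document order."""
    root = ET.fromstring(users_xml)
    return [user.findtext("UserID").strip() for user in root.findall("User")]


def unmatched_users(users_xml, user_description_xmls):
    """Return registered ids that have no matching user description scheme."""
    known = {ET.fromstring(x).findtext("UserID").strip() for x in user_description_xmls}
    return [uid for uid in registered_ids(users_xml) if uid not in known]


print(unmatched_users(SYSTEM_USERS, USER_DESCRIPTIONS))  # ['u2']
```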



Programs in the System

[0060]

• Categories

<Categories>
    <Category>
        <CategoryID> category-id </CategoryID>
        <CategoryName> category-name </CategoryName>
        <SubCategories> sub-category-id ... </SubCategories>
    </Category>
    <Category>
        <CategoryID> category-id </CategoryID>
        <CategoryName> category-name </CategoryName>
        <SubCategories> sub-category-id ... </SubCategories>
    </Category>
</Categories>

The descriptor <Categories> lists a number of categories which have been registered on a video system or
device. Each category is specified by the descriptor <Category>. The major-sub relationship between categories
is captured by the descriptor <SubCategories>.
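For illustration, the major-sub relationship carried by <SubCategories> may be expanded into an indented outline as sketched below. The category ids and names, and the functions load_categories and outline, are hypothetical; the sketch also assumes that the sub-category ids are whitespace separated and that the relationships contain no cycles.

```python
import xml.etree.ElementTree as ET

# Hypothetical category list; ids and names are illustrative only.
CATEGORIES = """
<Categories>
  <Category>
    <CategoryID>c1</CategoryID>
    <CategoryName>Sports</CategoryName>
    <SubCategories>c2 c3</SubCategories>
  </Category>
  <Category>
    <CategoryID>c2</CategoryID>
    <CategoryName>Basketball</CategoryName>
    <SubCategories></SubCategories>
  </Category>
  <Category>
    <CategoryID>c3</CategoryID>
    <CategoryName>Soccer</CategoryName>
  </Category>
</Categories>
"""


def load_categories(xml_text):
    """Return {category id: (category name, [sub-category ids])}."""
    cats = {}
    for cat in ET.fromstring(xml_text).findall("Category"):
        cid = cat.findtext("CategoryID").strip()
        subs = (cat.findtext("SubCategories") or "").split()
        cats[cid] = (cat.findtext("CategoryName").strip(), subs)
    return cats


def outline(cats, cid, depth=0):
    """Return an indented list of names for a category and its sub-categories."""
    name, subs = cats[cid]
    lines = ["  " * depth + name]
    for sub in subs:
        lines.extend(outline(cats, sub, depth + 1))
    return lines


print("\n".join(outline(load_categories(CATEGORIES), "c1")))
```

The same approach would apply to the major-sub relationship between channels.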

• Channels

<Channels>
    <Channel>
        <ChannelID> channel-id </ChannelID>
        <ChannelName> channel-name </ChannelName>
        <SubChannels> sub-channel-id ... </SubChannels>
    </Channel>
    <Channel>
        <ChannelID> channel-id </ChannelID>
        <ChannelName> channel-name </ChannelName>
        <SubChannels> sub-channel-id ... </SubChannels>
    </Channel>
</Channels>

The descriptor <Channels> lists a number of channels which have been registered on a video system or
device. Each channel is specified by the descriptor <Channel>. The major-sub relationship between channels is
captured by the descriptor <SubChannels>.



• Programs

<Programs>
    <CategoryPrograms>
        <CategoryID> category-id </CategoryID>
        <Programs> program-id ... </Programs>
    </CategoryPrograms>
    <CategoryPrograms>
        <CategoryID> category-id </CategoryID>
        <Programs> program-id ... </Programs>
    </CategoryPrograms>
    <ChannelPrograms>
        <ChannelID> channel-id </ChannelID>
        <Programs> program-id ... </Programs>
    </ChannelPrograms>
    <ChannelPrograms>
        <ChannelID> channel-id </ChannelID>
        <Programs> program-id ... </Programs>
    </ChannelPrograms>
</Programs>

The descriptor <Programs> lists programs which are available on a video system or device. The programs are
grouped under corresponding categories or channels. Each group of programs is specified by the descriptor
<CategoryPrograms> or <ChannelPrograms>. Each program id contained in the descriptor <Programs> should
match with the number or string specified in <ProgramID> in one of the program description schemes.
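A minimal sketch of how the program groups might be read, and how program ids lacking a corresponding program description scheme might be detected, is given below. The sample ids and the names program_groups and dangling_program_ids are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical <Programs> section; all ids are illustrative only.
PROGRAMS = """
<Programs>
  <CategoryPrograms>
    <CategoryID>c1</CategoryID>
    <Programs>p1 p2</Programs>
  </CategoryPrograms>
  <ChannelPrograms>
    <ChannelID>ch4</ChannelID>
    <Programs>p2 p9</Programs>
  </ChannelPrograms>
</Programs>
"""


def program_groups(xml_text):
    """Return {(id descriptor name, id value): [program ids]} for every group."""
    root = ET.fromstring(xml_text)
    groups = {}
    for group_tag, id_tag in (("CategoryPrograms", "CategoryID"),
                              ("ChannelPrograms", "ChannelID")):
        for group in root.findall(group_tag):
            key = (id_tag, group.findtext(id_tag).strip())
            groups[key] = group.findtext("Programs").split()
    return groups


def dangling_program_ids(groups, known_program_ids):
    """Return listed program ids that match no known <ProgramID>."""
    listed = {pid for ids in groups.values() for pid in ids}
    return sorted(listed - set(known_program_ids))


groups = program_groups(PROGRAMS)
print(dangling_program_ids(groups, {"p1", "p2"}))  # ['p9']
```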
System Capabilities

[0061]

• Views

<Views>
    <View>
        <ViewID> view-id </ViewID>
        <ViewName> view-name </ViewName>
    </View>
    <View>
        <ViewID> view-id </ViewID>
        <ViewName> view-name </ViewName>
    </View>
</Views>

The descriptor <Views> lists views which are supported by a video system or device. Each view is specified by
the descriptor <View>. The descriptor <ViewName> contains a string which should match with one of the following
views used in the program description schemes: ThumbnailView, SlideView, FrameView, ShotView,
KeyFrameView, HighlightView, EventView, and CloseUpView.
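The requirement that each <ViewName> match one of the listed views may be verified, for example, as in the following sketch. The view ids, the unsupported name "MosaicView", and the function check_views are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# View names used in the program description schemes, as listed above.
KNOWN_VIEWS = {
    "ThumbnailView", "SlideView", "FrameView", "ShotView",
    "KeyFrameView", "HighlightView", "EventView", "CloseUpView",
}

# Hypothetical capability list; "MosaicView" is deliberately not a known view.
VIEWS_XML = """
<Views>
  <View><ViewID>v1</ViewID><ViewName>ThumbnailView</ViewName></View>
  <View><ViewID>v2</ViewID><ViewName>HighlightView</ViewName></View>
  <View><ViewID>v3</ViewID><ViewName>MosaicView</ViewName></View>
</Views>
"""


def check_views(xml_text):
    """Return ({view id: view name} for known views, [unknown view names])."""
    supported, unknown = {}, []
    for view in ET.fromstring(xml_text).findall("View"):
        view_id = view.findtext("ViewID").strip()
        name = view.findtext("ViewName").strip()
        if name in KNOWN_VIEWS:
            supported[view_id] = name
        else:
            unknown.append(name)
    return supported, unknown


supported, unknown = check_views(VIEWS_XML)
print(supported, unknown)
```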

[0062] The present inventors came to the realization that the program description scheme may be further modified
to provide additional capabilities. Referring to FIG. 13, the modified program description scheme 400 includes four
separate types of information, namely, a syntactic structure description scheme 402, a semantic structure description
scheme 404, a visualization description scheme 406, and a meta information description scheme 408. It is to be
understood that in any particular system one or more of the description schemes may be included, as desired.
[0063] Referring to FIG. 14, the visualization description scheme 406 enables fast and effective browsing of video
programs (and audio programs) by allowing access to the necessary data, preferably in a one-step process. The
visualization description scheme 406 provides for several different presentations of the video content (or audio), such
as, for example, a thumbnail view description scheme 410, a key frame view description scheme 412, a highlight view
description scheme 414, an event view description scheme 416, a close-up view description scheme 418, and an
alternative view description scheme 420. Other presentation techniques and description schemes may be added, as
desired. The thumbnail view description scheme 410 preferably includes an image 422 or reference to an image
representative of the video content and a time reference 424 to the video. The key frame view description scheme 412
preferably includes a level indicator 426 and a time reference 428. The level indicator 426 accommodates the
presentation of a different number of key frames for the same video portion depending on the user's preference. The
highlight view description scheme 414 includes a length indicator 430 and a time reference 432. The length indicator
430 accommodates the presentation of a different highlight duration of a video depending on the user's preference. The
event view description scheme 416 preferably includes an event indicator 434 for the selection of the desired event and
a time reference 436. The close-up view description scheme 418 preferably includes a target indicator 438 and a time
reference 440. The alternative view description scheme 420 preferably includes a source indicator 442. To increase
performance of the system it is preferred to specify the data which is needed to render such views in a centralized and
straightforward manner. By doing so, it is then feasible to access the data in a simple one-step process without complex
parsing of the video.
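One possible in-memory arrangement of the key frame view and highlight view data is sketched below, so that the data for a view can be obtained in a single lookup. The class and method names are hypothetical. The sketch further assumes that time references are frame numbers, that lengths are in seconds, that a requested key frame level includes all key frames at or below that level, and that the longest highlight not exceeding the preferred length is selected; none of these conventions is fixed by the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KeyFrame:
    level: int     # level indicator 426
    time_ref: int  # time reference 428, modeled here as a frame number


@dataclass
class Highlight:
    length: int    # length indicator 430, modeled here as seconds
    time_ref: int  # time reference 432, modeled here as a start frame


@dataclass
class VisualizationDescription:
    key_frames: List[KeyFrame] = field(default_factory=list)
    highlights: List[Highlight] = field(default_factory=list)

    def key_frames_for(self, level: int) -> List[int]:
        """Time references of key frames at or below the requested level."""
        return [k.time_ref for k in self.key_frames if k.level <= level]

    def highlight_for(self, max_length: int) -> Optional[Highlight]:
        """Longest highlight whose length does not exceed max_length, if any."""
        fitting = [h for h in self.highlights if h.length <= max_length]
        return max(fitting, key=lambda h: h.length, default=None)


vds = VisualizationDescription(
    key_frames=[KeyFrame(1, 0), KeyFrame(2, 450), KeyFrame(1, 900), KeyFrame(3, 1200)],
    highlights=[Highlight(30, 100), Highlight(120, 100)],
)
print(vds.key_frames_for(2))          # [0, 450, 900]
print(vds.highlight_for(60).length)   # 30
```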

[0064] Referring to FIG. 15, the meta information description scheme 408 generally includes various descriptors
which carry general information about a video (or audio) program such as the title, category, keywords, etc. Additional
descriptors, such as those previously described, may be included, as desired.

[0065] Referring again to FIG. 13, the syntactic structure description scheme 402 specifies the physical structure
of a video program (or audio), e.g., a table of contents. The physical features may include, for example, color, texture,
motion, etc. The syntactic structure description scheme 402 preferably includes three modules, namely a segment
description scheme 450, a region description scheme 452, and a segment/region relation graph description scheme
454. The segment description scheme 450 may be used to define relationships between different portions of the video
consisting of multiple frames of the video. A segment description scheme 450 may contain another segment description
scheme 450 and/or shot description scheme to form a segment tree. Such a segment tree may be used to define a
temporal structure of a video program. Multiple segment trees may be created and thereby create multiple tables of
contents. For example, a video program may be segmented into story units, scenes, and shots, from which the segment
description scheme 450 may contain such information as a table of contents. The shot description scheme may contain a
number of key frame description schemes, a mosaic description scheme(s), a camera motion description scheme(s),
etc. The key frame description scheme may contain a still image description scheme which may in turn contain color
and texture descriptors. It is noted that various low level descriptors may be included in the still image description
scheme under the segment description scheme. Also, the visual descriptors may be included in the region description
scheme which is not necessarily under a still image description scheme. One example of a segment description scheme
450 is shown in FIG. 16.
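The nesting of segment description schemes into a segment tree, and the derivation of a table of contents from that tree, may be illustrated by the following sketch. The Segment class, its field names, and the sample program are hypothetical; frame ranges are used here as one possible temporal reference.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    title: str
    start_frame: int
    end_frame: int
    children: List["Segment"] = field(default_factory=list)


def table_of_contents(segment, depth=0):
    """Depth-first walk of the segment tree, one indented line per segment."""
    lines = [f"{'  ' * depth}{segment.title} [{segment.start_frame}-{segment.end_frame}]"]
    for child in segment.children:
        lines.extend(table_of_contents(child, depth + 1))
    return lines


# Hypothetical program segmented into story units, scenes, and shots.
program = Segment("Evening news", 0, 2999, [
    Segment("Story 1", 0, 1499, [
        Segment("Scene 1", 0, 699, [
            Segment("Shot 1", 0, 299),
            Segment("Shot 2", 300, 699),
        ]),
        Segment("Scene 2", 700, 1499),
    ]),
    Segment("Story 2", 1500, 2999),
])
print("\n".join(table_of_contents(program)))
```

Building a second tree over the same frames, with a different grouping, would yield a second table of contents for the same program.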

[0066] Referring to FIG. 17, the region description scheme 452 defines the interrelationships between groups of
pixels of the same and/or different frames of the video. The region description scheme 452 may also contain
geometrical features, color, texture features, motion features, etc.

[0067] Referring to FIG. 18, the segment/region relation graph description scheme 454 defines the
interrelationships between a plurality of regions (or region description schemes), a plurality of segments (or segment
description schemes), and/or a plurality of regions (or description schemes) and segments (or description schemes).
[0068] Referring again to FIG. 13, the semantic structure description scheme 404 is used to specify semantic
features of a video program (or audio), e.g., semantic events. In a similar manner to the syntactic structure description
scheme, the semantic structure description scheme 404 preferably includes three modules, namely an event
description scheme 480, an object description scheme 482, and an event/object relation graph description scheme 484.
The event description scheme 480 may be used to form relationships between different events of the video normally
consisting of multiple frames of the video. An event description scheme 480 may contain another event description
scheme 480 to form a segment tree. Such an event segment tree may be used to define a semantic index table for a
video program. Multiple event trees may be created and thereby create multiple index tables. For example, a video
program may include multiple events, such as a basketball dunk, a fast break, and a free throw, and the event
description scheme may contain such information as an index table. The event description scheme may also contain
references which link the event to the corresponding segments and/or regions specified in the syntactic structure
description scheme. One example of an event description scheme is shown in FIG. 19.

[0069] Referring to FIG. 20, the object description scheme 482 defines the interrelationships between groups of
pixels of the same and/or different frames of the video representative of objects. The object description scheme 482
may contain another object description scheme and thereby form an object tree. Such an object tree may be used to
define an object index table for a video program. The object description scheme may also contain references which link
the object to the corresponding segments and/or regions specified in the syntactic structure description scheme.

[0070] Referring to FIG. 21, the event/object relation graph description scheme 484 defines the interrelationships
between a plurality of events (or event description schemes), a plurality of objects (or object description schemes),
and/or a plurality of events (or description schemes) and objects (or description schemes).
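A relation graph of this kind may be represented, by way of a hedged example, as a set of labeled edges between events and objects. The RelationGraph class, the relation labels "performed_by" and "involves", and the sample nodes are hypothetical and are not defined by the description schemes.

```python
from collections import defaultdict


class RelationGraph:
    """Labeled, directed relations between events and/or objects."""

    def __init__(self):
        self._edges = defaultdict(list)

    def relate(self, source, relation, target):
        """Record that source stands in the named relation to target."""
        self._edges[source].append((relation, target))

    def related(self, source, relation=None):
        """Targets related to source, optionally restricted to one relation."""
        return [t for r, t in self._edges.get(source, [])
                if relation is None or r == relation]


graph = RelationGraph()
graph.relate("dunk", "performed_by", "player 23")   # event -> object
graph.relate("dunk", "involves", "ball")            # event -> object
graph.relate("fast break", "precedes", "dunk")      # event -> event
print(graph.related("dunk"))               # ['player 23', 'ball']
print(graph.related("dunk", "involves"))   # ['ball']
```

The same structure could hold segment/region relations in the syntactic structure description scheme.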

[0071] The terms and expressions that have been employed in the foregoing specification are used as terms of
description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding
equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention
is defined and limited only by the claims that follow.

Claims 

1. A method of using a system (12) with at least one of audio, an image, and a video comprising a plurality of frames
comprising the steps of:

(a) providing at least one of the following:

(i) a program description scheme (18) containing information (10) related to at least one of information
regarding interrelationships between a plurality of said frames, characteristics of the content of a plurality
of said frames, characteristics of the content of said audio, characteristics of the content of said image,
characteristics of the content of said video;

(ii) a user description scheme (20) containing information (14) related to at least one of a user's personal
preferences, information related to said user, a user's viewing history, and a user's listening history;

(iii) a system description scheme (22) containing information (10) regarding at least one of available videos,
available categories, available channels, available users, available images, capabilities of a device for
providing said at least one of said audio, said image, and said video to a user, relationship between at least
two of said video, said program description scheme (18), and said user description scheme (20), relationship
between at least two of said audio, said program description scheme (18), and said user description
scheme (20), relationship between at least two of said image, said program description scheme (18), and
said user description scheme (20); and

(b) selecting at least one of a video, an image, and audio based upon said at least one of said program
description scheme (18), said user description scheme (20), and said system description scheme (22).

2. A method of using a system (12) with at least one of audio, image, and a video comprising a plurality of frames
comprising the steps of:

(a) providing at least two of the following:

(i) a program description scheme (18) containing information (10) related to at least one of information
regarding interrelationships between a plurality of said frames, characteristics of the content of a plurality
of said frames, characteristics of the content of said audio, characteristics of the content of said image,
characteristics of the content of said video;

(ii) a user description scheme (20) containing information (14) related to at least one of a user's personal
preferences, information related to said user, a user's viewing history, and a user's listening history;

(iii) a system description scheme (22) containing information (10) regarding at least one of available videos,
available categories, available channels, available users, available images, capabilities of a device for
providing said at least one of said audio, said image, and said video to a user, relationship between at least
two of said video, said program description scheme (18), and said user description scheme (20), relationship
between at least two of said audio, said program description scheme (18), and said user description
scheme (20), relationship between at least two of said image, said program description scheme (18), and
said user description scheme (20); and

(b) selecting at least one of a video, an image, and audio based upon said at least two of said program
description scheme (18), said user description scheme (20), and said system description scheme (22).

3. The method of claim 1 or 2 wherein said program description scheme (18) includes at least one of a title, a
category, an annotation, a keyword, and a date related to said plurality of said frames.

4. The method of claim 1 or 2 wherein said program description scheme (18) includes fields for storing (1) information
regarding interrelationships between said plurality of said frames includes the identification of key frames of said
video, (2) information regarding interrelationships between said plurality of said frames includes the identification
of a plurality of said frames representative of the highlights of at least a portion of said video, (3) information
regarding interrelationships between said plurality of said frames includes the identification of a set of frames, each of
which is representative of a different portion of said video, (4) and information regarding interrelationships between
said plurality of said frames includes the identification of a plurality of sequential frames of said video that represent
at least one of a shot and a scene.

5. The method of claim 1 or 2 wherein said program description scheme (18) includes fields for storing at least one of
a color profile of at least a portion of said video, a texture profile of at least a portion of said video, a shape profile
of at least a portion of said video, and a motion profile of at least a portion of said video.

6. The method of claim 1 or 2 wherein said program description scheme (18) identifies a second audio track separate 
from the normal audio track of said video. 

7. The method of claim 1 or 2 wherein said program description scheme (18) identifies Internet based information
related to said video.

8. The method of claim 1 or 2 wherein said program description scheme (18) includes syntactic structure of a plurality
of frames of said video.

9. The method of claim 1 or 2 wherein said program description scheme (18) includes semantic structure regarding a
plurality of said frames of said video.

10. The method of claim 1 or 2 wherein information for said program description scheme (18) is extracted from the
content of a video itself.

11. The method of claim 1 or 2 further comprising the step of generating a summary of said video of a user determined
duration based upon said information of said program description scheme (18).

12. The method of claim 1 or 2 further comprising the steps of:

(a) generating at least one of summary and key frame information of said video based upon the content of said
video; and

(b) including said at least one of said summary and said key frame information in said program description
scheme (18).

13. The method of claim 1 or 2 wherein said user description scheme (20) contains information related to said user's 
viewing history. 

14. The method of claim 1 or 2 wherein said information (14) related to said user includes at least one of geographic 
information and demographic information. 

15. The method of claim 1 or 2 wherein said user description scheme (20) contains a user's personal preferences. 

16. The method of claim 1 or 2 wherein said user description scheme (20) is contained in a handheld electronic device.

17. The method of claim 1 or 2 wherein said user description scheme (20) contains at least one of preselected
frequencies and preselected stations for radio broadcasts.

18. The method of claim 1 or 2 wherein said system description scheme (22) includes capabilities of said device (12) 
for providing said at least one of said audio, said image, and said video to said user. 

19. The method of claim 1 or 2 wherein said video is received from at least one of broadcast television, cable television,
satellite television, digital television, Internet broadcasts, world wide web, digital video discs, still images, video
cameras, laser discs, magnetic media, computer hard drive, video tape, data services, and microwave
communications.

20. The method of claim 1 or 2 wherein said program description scheme (18) is received from at least one of
PSIP/DVB-SI information in digital television broadcasts, specialized digital television data services, specialized
Internet services, data file, data over a telephone wire, and computer memory.

21. The method of claim 1 or 2 further comprising the step of in response to receiving said video determining together
with information within said user description scheme (20) whether to perform an analysis of the content of said
video.

22. The method of claim 1 or 2 further comprising the steps of: 

(a) extracting information contained in at least one of the system description scheme (22) and the program 
description scheme (18); and 

(b) modifying said information (14) contained in said user description scheme (20) based upon said extracting.

23. The method of claim 1 or 2 further comprising the steps of: 

(a) storing said user description scheme (20) on a first portable device; 

(b) interconnecting said portable device with a plurality of different second devices, each of which uses the 
information (14) contained within said user description scheme (20).

24. The method of claim 1 or 2 wherein said program description scheme (18) is included within a camera and the system
modifies said information contained within said camera based on, at least in part, said information of said user
description scheme (20) and said information of said system description scheme (22).

25. The method of claim 1 or 2 further comprising a search device to identify video based on, at least in part, said infor- 
mation (10) of said program description scheme (18) and said information (14) of said user description scheme 
(20). 

26. The method of claim 1 or 2 further comprising a recording device that records video broadcasts that may be desir- 
able to said user based upon said information (10) contained within said program description scheme (18) and said 
information (14) contained within said user description scheme (20). 

27. The method of claim 1 or 2 further comprising the steps of:

(a) storing at least one of said user description scheme (20), said system description scheme (22) and said 
program description scheme (18) on a first device; and 

(b) transferring at least one of said user description scheme (20), said system description scheme (22) and 
said program description scheme (18) to a second device through a network.

28. The method of claim 1 or 2 further comprising the steps of: 

(a) providing the system description scheme (22) across a network to a provider of at least one of audio, image, 
and video data;

(b) in response to receiving said system description scheme (22) said provider selecting said at least one of 
audio, image, and video data in accordance with said system description scheme (22); and 

(c) providing said at least one of said audio, image and video data to a device (12) for a user. 